58 research outputs found
Tolerating Correlated Failures in Massively Parallel Stream Processing Engines
Fault-tolerance techniques for stream processing engines can be categorized
into passive and active approaches. A typical passive approach periodically
checkpoints a processing task's runtime states and can recover a failed task by
restoring its runtime state using its latest checkpoint. On the other hand, an
active approach usually employs backup nodes to run replicated tasks. Upon
failure, the active replica can take over the processing of the failed task
with minimal latency. However, both approaches have their own inadequacies in
Massively Parallel Stream Processing Engines (MPSPE). The passive approach
incurs a long recovery latency especially when a number of correlated nodes
fail simultaneously, while the active approach requires extra replication
resources. In this paper, we propose a new fault-tolerance framework, which is
Passive and Partially Active (PPA). In a PPA scheme, the passive approach is
applied to all tasks while only a selected set of tasks will be actively
replicated. The number of actively replicated tasks depends on the available
resources. If tasks without active replicas fail, tentative outputs will be
generated before the completion of the recovery process. We also propose
effective and efficient algorithms to optimize a partially active replication
plan to maximize the quality of tentative outputs. We implemented PPA on top of
Storm, an open-source MPSPE and conducted extensive experiments using both real
and synthetic datasets to verify the effectiveness of our approach
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
Load balancing, operator instance collocations and horizontal scaling are
critical issues in Parallel Stream Processing Engines to achieve low data
processing latency, optimized cluster utilization and minimized communication
cost respectively. In previous work, these issues are typically tackled
separately and independently. We argue that these problems are tightly coupled
in the sense that they all need to determine the allocations of workloads and
migrate computational states at runtime. Optimizing them independently would
result in suboptimal solutions. Therefore, in this paper, we investigate how
these three issues can be modeled as one integrated optimization problem. In
particular, we first consider jobs where workload allocations have little
effect on the communication cost, and model the problem of load balance as a
Mixed-Integer Linear Program. Afterwards, we present an extended solution
called ALBIC, which support general jobs. We implement the proposed techniques
on top of Apache Storm, an open-source Parallel Stream Processing Engine. The
extensive experimental results over both synthetic and real datasets show that
our techniques clearly outperform existing approaches
Disseminating streaming data in a dynamic environment: an adaptive and cost-based approach
In a distributed stream processing system, streaming data are continuously disseminated from the sources to the distributed processing servers. To enhance the dissemination efficiency, these servers are typically organized into one or more dissemination trees. In this paper, we focus on the problem of constructing dissemination trees to minimize the average loss of fidelity of the system. We observe that existing heuristic-based approaches can only explore a limited solution space and hence may lead to sub-optimal solutions. On the contrary, we propose an adaptive and cost-based approach. Our cost model takes into account both the processing cost and the communication cost. Furthermore, as a distributed stream processing system is vulnerable to inaccurate statistics, runtime fluctuations of data characteristics, server workloads, and network conditions, we have designed our scheme to be adaptive to these situations: an operational dissemination tree may be incrementally transformed to a more cost-effective one. Our adaptive strategy employs distributed decisions made by the distributed servers independently based on localized statistics collected by each server at runtime. For a relatively static environment, we also propose two static tree construction algorithms relying on apriori system statistics. These static trees can also be used as initial trees in a dynamic environment. We apply our schemes to both single- and multi-object dissemination. Our extensive performance study shows that the adaptive mechanisms are effective in a dynamic context and the proposed static tree construction algorithms perform close to optimal in a static environmen
Data Management in Microservices: State of the Practice, Challenges, and Research Directions
We are recently witnessing an increased adoption of microservice
architectures by the industry for achieving scalability by functional
decomposition, fault-tolerance by deployment of small and independent services,
and polyglot persistence by the adoption of different database technologies
specific to the needs of each service. Despite the accelerating industrial
adoption and the extensive research on microservices, there is a lack of
thorough investigation on the state of the practice and the major challenges
faced by practitioners with regard to data management. To bridge this gap, this
paper presents a detailed investigation of data management in microservices.
Our exploratory study is based on the following methodology: we conducted a
systematic literature review of articles reporting the adoption of
microservices in industry, where more than 300 articles were filtered down to
11 representative studies; we analyzed a set of 9 popular open-source
microservice-based applications, selected out of more than 20 open-source
projects; furthermore, to strengthen our evidence, we conducted an online
survey that we then used to cross-validate the findings of the previous steps
with the perceptions and experiences of over 120 practitioners and researchers.
Through this process, we were able to categorize the state of practice and
reveal several principled challenges that cannot be solved by software
engineering practices, but rather need system-level support to alleviate the
burden of practitioners. Based on the observations we also identified a series
of research directions to achieve this goal. Fundamentally, novel database
systems and data management tools that support isolation for microservices,
which include fault isolation, performance isolation, data ownership, and
independent schema evolution across microservices must be built to address the
needs of this growing architectural style
Fast Search-By-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests
The vast amounts of data collected in various domains pose great challenges
to modern data exploration and analysis. To find "interesting" objects in large
databases, users typically define a query using positive and negative example
objects and train a classification model to identify the objects of interest in
the entire data catalog. However, this approach requires a scan of all the data
to apply the classification model to each instance in the data catalog, making
this method prohibitively expensive to be employed in large-scale databases
serving many users and queries interactively. In this work, we propose a novel
framework for such search-by-classification scenarios that allows users to
interactively search for target objects by specifying queries through a small
set of positive and negative examples. Unlike previous approaches, our
framework can rapidly answer such queries at low cost without scanning the
entire database. Our framework is based on an index-aware construction scheme
for decision trees and random forests that transforms the inference phase of
these classification models into a set of range queries, which in turn can be
efficiently executed by leveraging multidimensional indexing structures. Our
experiments show that queries over large data catalogs with hundreds of
millions of objects can be processed in a few seconds using a single server,
compared to hours needed by classical scanning-based approaches
- …